LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS
BY ORDINARY LEAST SQUARES


Gus  W.  Haggstrom 


If  the  observations  for  fitting  a  polytomous  logistic  regression 
model  satisfy  certain  normality  assumptions,  the  maximum  likelihood 
estimates  of  the  regression  coefficients  are  the  discriminant  function 
estimates.  This  paper  shows  that  these  estimates,  their  unbiased 
counterparts,  and  associated  test  statistics  for  variable  selection 
can  be  calculated  using  ordinary  least  squares  regression  techniques, 
thereby  providing  a  convenient  procedure  for  performing  discriminant 
analysis  and  fitting  logistic  regression  models  in  the  normal  case. 

If  the  normality  assumptions  are  violated,  the  discriminant  function 
estimates  and  test  statistics  afford  readily  calculated  alternatives 
to  other  procedures  for  fitting  logistic  regression  models,  such  as 
the  conditional  maximum  likelihood  estimates,  that  present  theoretical 
and  computational  difficulties.  Empirical  evidence  is  provided  to  show 
that the results of fitting logistic regression models using the discriminant
function approach often agree closely with those obtained by


1. INTRODUCTION


R.  A.  Fisher  (1936)  provided  a  convenient  mnemonic  derivation  of 
the  linear  discriminant  function  based  on  samples  from  two  multivariate 
normal  distributions.  He  showed  that  a  multiple  of  the  discriminant 
function  coefficient  vector  could  be  obtained  by  fitting  a  linear 
equation  by  least  squares  using  the  components  of  the  observation 
vectors  as  independent  variables  and  a  dichotomous  dependent  variable 
to separate the individuals in the two samples. Later it was shown
that the t- and F-statistics associated with this least squares procedure
provide valid tests of hypotheses pertaining to the discriminant
coefficients. This paper extends these results to the case of three
or more populations and shows how the analogous logistic regression
model in the normal case can be fitted and tested using least squares
techniques.

The  logistic  regression  model  arises  in  quantifying  the  dependence 
of  a  polytomous  (categorical)  variable  y  on  a  q-dimensional  vector  x 
of  explanatory  variables.  Logistic  regression  (or  "logit  analysis") 
is  related  to  discriminant  analysis  in  that  the  variable  y  may  reflect 
membership  in  one  of  several  populations,  in  each  of  which  the  vector 
x  has  a  multivariate  normal  distribution. 

To explore this relationship, we first consider the general
classification problem. Suppose that an individual is drawn at random
from a population consisting of m disjoint subpopulations $\pi_1, \pi_2, \ldots, \pi_m$,
and consider the problem of classifying the individual into
one of the subpopulations on the basis of a q-dimensional vector $x$ of
measurements on this individual. Let $y$ be the random variable having
the value $j$ for individuals in $\pi_j$, and let $p_j = P(y = j)$ denote the
prior probability of drawing an individual from $\pi_j$.

If the conditional density of $x$ in $\pi_j$ with respect to some measure
$\mu$ on $R^q$ is $f_j(x)$, the posterior probability that the individual belongs
to $\pi_j$ given $x$ is

$$p(j|x) = P(y = j \mid x) = \frac{p_j f_j(x)}{\sum_{k=1}^{m} p_k f_k(x)}. \tag{1.1}$$

Among rules for classifying an individual into one of the subpopulations
$\pi_j$ on the basis of $x$, the rule that decides $y = i$ when $p(i|x) = \max_j p(j|x)$
maximizes the probability of correct classification (Ferguson, 1967,
p. 292). In practice, the density functions $f_j(x)$ and the prior probabilities
$p_j$ are usually unknown, and statistical models are often
posited in which the conditional probabilities are expressed as simple
functions of parameters that can be estimated from a training set of
n observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$.

A logistic regression model for the pair $(x, y)$ is characterized
by the condition that the probabilities $p(j|x)$ are expressible in the
form

$$p(j|x) = \exp(\gamma_j + \delta_j'x) \Big/ \sum_{k=1}^{m} \exp(\gamma_k + \delta_k'x) \tag{1.2}$$

for some set of parameters $\gamma_j$ and $\delta_j = (\delta_{j1}, \ldots, \delta_{jq})'$. In the dichotomous
case (m = 2), this reduces to the binary logistic form

$$p(1|x) = 1/[1 + e^{-(\alpha + \beta'x)}], \tag{1.3}$$

if one sets $\alpha = \gamma_1 - \gamma_2$ and $\beta = \delta_1 - \delta_2$.

The parameters $\gamma_j$ and $\delta_j$ in (1.2) are not uniquely determined,
because the probabilities $p(j|x)$ remain unchanged if one multiplies
the numerator and denominator in (1.2) by $\exp(a + b'x)$ for any $a$ and
$b$. One way to specify the parameters uniquely is to incorporate side
conditions, such as $\sum_j \gamma_j = 0$ and $\sum_j \delta_j = 0$, or $\gamma_m = 0$ and $\delta_m = 0$.

Alternatively, one can specify these parameters as functions of other
parameters that index the joint distribution of $x$ and $y$. The latter
method will be used in treating the normal case below.

It follows from (1.1) that the pair $(x, y)$ satisfies a logistic
regression model whenever $\log[f_j(x)/f_m(x)]$ is a linear function in $x$
for $j = 1, 2, \ldots, m - 1$. In particular, this holds if the conditional
densities belong to an exponential family

$$f_j(x) = c(\theta_j)\, h(x) \exp(\theta_j'x) \tag{1.4}$$

where $\theta_j$ is a q-dimensional vector of parameters. Day and Kerridge (1967)
provided a slightly different specification by adopting densities of the
form

$$f_j(x) = c_j \exp[-(x - \mu_j)'\Sigma^{-1}(x - \mu_j)/2]\, \varphi(x) \tag{1.5}$$

for some q-dimensional vector $\mu_j$ and some nonsingular $q \times q$ covariance
matrix $\Sigma$. This can be written in the form (1.4) with $\theta_j = \Sigma^{-1}\mu_j$.

The normal case that gives rise to a logistic regression model is
the case in which $x$ has a multivariate normal distribution $N_q(\mu_j, \Sigma_j)$ in $\pi_j$
and the covariance matrices satisfy $\Sigma_1 = \cdots = \Sigma_m = \Sigma$. Here, the
density of $x$ in $\pi_j$ is

$$f_j(x) = (2\pi)^{-q/2}\,|\Sigma|^{-1/2} \exp[-(x - \mu_j)'\Sigma^{-1}(x - \mu_j)/2]. \tag{1.6}$$

It follows from (1.1) that the conditional probabilities $p(j|x)$ satisfy
a logistic regression model (1.2) with the parameters equal to the
discriminant function coefficients

$$\delta_j = \Sigma^{-1}\mu_j, \qquad \gamma_j = \log p_j - \mu_j'\Sigma^{-1}\mu_j/2. \tag{1.7}$$

In the dichotomous case, the parameters of the binary logistic model
(1.3) are given by

$$\beta = \Sigma^{-1}(\mu_1 - \mu_2), \qquad \alpha = \log(p_1/p_2) - \beta'(\mu_1 + \mu_2)/2. \tag{1.8}$$
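The equivalence between Bayes' rule (1.1) with the normal densities (1.6) and the logistic form (1.2) with the parameters (1.7) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the priors, mean vectors, covariance matrix, and function names are illustrative choices, not values from the paper.

```python
import numpy as np

def logistic_params(priors, means, cov):
    """Parameters (1.7): delta_j = Sigma^{-1} mu_j,
    gamma_j = log p_j - mu_j' Sigma^{-1} mu_j / 2."""
    cov_inv = np.linalg.inv(cov)
    delta = means @ cov_inv                       # row j holds delta_j'
    gamma = np.log(priors) - 0.5 * np.sum(delta * means, axis=1)
    return gamma, delta

def posterior_logistic(x, gamma, delta):
    """Posterior p(j|x) computed from the logistic form (1.2)."""
    scores = gamma + delta @ x
    scores -= scores.max()                        # stabilise the exponentials
    w = np.exp(scores)
    return w / w.sum()

def posterior_bayes(x, priors, means, cov):
    """Posterior p(j|x) computed from Bayes' rule (1.1) and densities (1.6)."""
    cov_inv = np.linalg.inv(cov)
    q = len(x)
    norm = (2 * np.pi) ** (-q / 2) * np.linalg.det(cov) ** (-0.5)
    dens = np.array([norm * np.exp(-0.5 * (x - mu) @ cov_inv @ (x - mu))
                     for mu in means])
    w = priors * dens
    return w / w.sum()

# Illustrative (assumed) values for m = 3 populations and q = 2 variables.
priors = np.array([0.5, 0.3, 0.2])
means = np.array([[0.0, 0.0], [1.0, 0.5], [-0.5, 2.0]])
cov = np.array([[1.0, 0.3], [0.3, 2.0]])
x = np.array([0.4, 1.1])

gamma, delta = logistic_params(priors, means, cov)
p_logit = posterior_logistic(x, gamma, delta)
p_bayes = posterior_bayes(x, priors, means, cov)
```

The two posterior vectors agree because the term $x'\Sigma^{-1}x/2$ and the normalizing constant are common to all populations and cancel in the ratio.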

In treating the estimation of the parameters in (1.7) and (1.8),
we assume there is a training set of n independent observations
$(x_i, y_i)$, $i = 1, 2, \ldots, n$, such that for any pair $(x, y)$ the distribution
of $x$ given $y = j$ is $N_q(\mu_j, \Sigma)$. Let $n_j$ be the number of observations for
which $y_i = j$. Two cases will be considered:

Case I: The variables $y_i$ are random with $P(y_i = j) = p_j$.

Case II: The variables $y_i$ are constants, i.e., the observations
arise from separate samples of fixed sizes $n_1, \ldots, n_m$ from populations
$\pi_1, \ldots, \pi_m$.

In Case I, the maximum likelihood estimators (MLEs) of the parameters
$p_j$, $\mu_j$, and $\Sigma$ are the values that maximize the likelihood function

$$L = \prod_{i=1}^{n} \prod_{j=1}^{m} [p_j f_j(x_i)]^{v_{ji}} = \prod_{j=1}^{m} p_j^{n_j} \prod_{i=1}^{n} \prod_{j=1}^{m} [f_j(x_i)]^{v_{ji}}, \tag{1.9}$$

where $v_{ji} = 1$ if $y_i = j$ and $v_{ji} = 0$ otherwise. The likelihood function
for Case II is the same except that the factors involving $p_j$ are missing.
In either case, the MLEs of the parameters $\gamma_j$ and $\delta_j$ are the discriminant
function estimators obtained by substituting the MLEs $\hat\mu_j$ and $\hat\Sigma$ in (1.7).
By well-known results for Case II (e.g., Anderson, 1958, p. 248), the
MLE of $\mu_j$ is the sample mean vector

$$\hat\mu_j = \bar x_j = \sum_{i} v_{ji} x_i / n_j, \tag{1.10}$$

and the MLE of $\Sigma$ is $\hat\Sigma = A/n$, where $A$ is the pooled sum of squares and
cross products matrix

$$A = \sum_{i=1}^{n} \sum_{j=1}^{m} v_{ji} (x_i - \bar x_j)(x_i - \bar x_j)'. \tag{1.11}$$

If the values of $p_j$ are unknown in Case I, the MLE of $p_j$ is $\hat p_j = n_j/n$.
In Case II, we shall assume the $p_j$'s are known.

Thus, the MLEs of the parameters in (1.7) are

$$\hat\delta_j = \hat\Sigma^{-1}\bar x_j, \qquad \hat\gamma_j = \log \hat p_j - \bar x_j'\hat\Sigma^{-1}\bar x_j/2. \tag{1.12}$$

In the dichotomous case (1.8), the MLEs are

$$\hat\beta = \hat\Sigma^{-1}(\bar x_1 - \bar x_2), \qquad \hat\alpha = \log(\hat p_1/\hat p_2) - \hat\beta'(\bar x_1 + \bar x_2)/2. \tag{1.13}$$

While a number of statistical packages exist for calculating the
discriminant function estimates directly, few provide test statistics
for performing variable selection, and they often lack the versatility
for making transformations, deleting variables (or cases), plotting,
and treating missing values that linear model practitioners are accustomed
to. We now show how these estimates, their unbiased counterparts,
and associated test statistics can be calculated by applying least squares
procedures to linear models.


2.  THE  DICHOTOMOUS  CASE 


For the case m = 2, we redefine the variables $y_i$ to have values
1 and 0, instead of 1 and 2, and let $y = (y_1, y_2, \ldots, y_n)'$ denote
the vector of dummy variables indicating membership in $\pi_1$. Let $a$ and
$b$ denote the "intermediate least squares" (ILS) estimates that result
from treating the observations $(x_i, y_i)$ as if they satisfied a linear
model

$$y_i = \alpha + \beta'x_i + e_i, \tag{2.1}$$

and let $SS_e$ denote the residual sum of squares from this regression:

$$SS_e = \sum_{i=1}^{n} (y_i - a - b'x_i)^2. \tag{2.2}$$

Theorem 1. The MLEs of the logistic regression coefficients in
(1.8) are related to the ILS estimates $a$ and $b$ by

$$\hat\beta = Kb, \qquad \hat\alpha = \log(\hat p_1/\hat p_2) + K(a - 1/2) + n(n_1^{-1} - n_2^{-1})/2, \tag{2.3}$$

where $K = n/SS_e$.

Before proceeding with the proof, we first note that, since
$\hat p_1/\hat p_2 = n_1/n_2 = \bar y/(1 - \bar y)$ in Case I, these estimates can be readily
calculated by hand from just the values of $a$, $b$, $n$, $SS_e$, and $\bar y$. Also,
as will be seen below, the standard errors and t-statistics for the
logistic regression coefficients $\hat\beta_k$ are readily obtained from those
associated with the ILS estimates.
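Theorem 1 can be illustrated numerically. The sketch below, assuming NumPy and simulated two-sample normal data (the sample sizes, seed, means, and covariance are arbitrary illustrative choices), computes the MLEs directly from (1.13) and again from an ordinary least squares fit of the 0/1 variable $y$ on $x$ using (2.3), with $\hat p_1/\hat p_2 = n_1/n_2$ as in Case I.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, q = 40, 60, 3
n = n1 + n2

# Simulated samples from two normal populations with a common covariance.
cov = np.array([[1.0, 0.4, 0.0], [0.4, 2.0, 0.3], [0.0, 0.3, 1.5]])
chol = np.linalg.cholesky(cov)
x1 = rng.standard_normal((n1, q)) @ chol.T + np.array([1.0, 0.0, 0.5])
x2 = rng.standard_normal((n2, q)) @ chol.T
X = np.vstack([x1, x2])
y = np.concatenate([np.ones(n1), np.zeros(n2)])   # 1 for pi_1, 0 for pi_2

# Direct discriminant function MLEs, equation (1.13).
xbar1, xbar2 = x1.mean(axis=0), x2.mean(axis=0)
A = (x1 - xbar1).T @ (x1 - xbar1) + (x2 - xbar2).T @ (x2 - xbar2)
sigma_hat = A / n
beta_direct = np.linalg.solve(sigma_hat, xbar1 - xbar2)
log_prior_ratio = np.log(n1 / n2)                  # Case I: p_hat_j = n_j / n
alpha_direct = log_prior_ratio - beta_direct @ (xbar1 + xbar2) / 2

# Intermediate least squares (ILS) fit of y on an intercept and x.
D = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
a, b = coef[0], coef[1:]
ss_e = np.sum((y - D @ coef) ** 2)                 # residual sum of squares (2.2)

# Conversion of the ILS estimates to the MLEs, equation (2.3).
K = n / ss_e
beta_ols = K * b
alpha_ols = log_prior_ratio + K * (a - 0.5) + n * (1 / n1 - 1 / n2) / 2
```

Up to floating-point error, `beta_ols` matches `beta_direct` and `alpha_ols` matches `alpha_direct`, so any least squares routine that reports the coefficients and the residual sum of squares suffices.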

In Fisher's original mnemonic procedure for deriving an unspecified
multiple of the discriminant function coefficients, he used a dichotomous
dependent variable $u$ having the values $n_2/n$ and $-n_1/n$ instead of 1
and 0. Since the values $u_i$ are the centered values $y_i - \bar y$, the values
of $b$ and $SS_e$ are the same for both choices. Previous writers (Warner,
1961; Lachenbruch, 1975, p. 26) have derived formulas for the factor $K$
in (2.3) that are not as readily calculated, given the value of $SS_e$.

To prove Theorem 1, we follow Fisher (1938) and observe that $b$
satisfies the normal equations

$$Z'Zb = Z'u \tag{2.4}$$

where $Z$ is the $n \times q$ matrix of centered values of the $x_i$'s. These
equations can be rewritten in the form

$$(A + cdd')b = cd, \tag{2.5}$$

where $d = \bar x_1 - \bar x_2$, $c = n_1n_2/n$, and $A = n\hat\Sigma$ is the pooled sum of squares
and cross products matrix. Thus, $Ab = c(1 - b'd)d$, implying that

$$b = c(1 - b'd)A^{-1}d = \hat\beta/K \tag{2.6}$$

where

$$K = n/[c(1 - b'd)]. \tag{2.7}$$

Since the sum of squares due to regression is

$$SS(\mathrm{reg}) = b'Z'u = c(b'd) \tag{2.8}$$

and the total sum of squares about the mean is

$$SS(\mathrm{tot}) = n_1(1 - n_1/n)^2 + n_2(-n_1/n)^2 = c, \tag{2.9}$$

the residual sum of squares is

$$SS_e = c(1 - b'd) = n/K. \tag{2.10}$$

Hence, from (2.6) and (2.10), $\hat\beta = Kb$ where $K = n/SS_e$.
To show that $\hat\alpha$ in (1.13) can be written in the form (2.3), it
suffices to show that

$$-\hat\beta'(\bar x_1 + \bar x_2)/2 = K(a - 1/2) + n(n_1^{-1} - n_2^{-1})/2 \tag{2.11}$$

or

$$b'(\bar x_1 + \bar x_2) = -2a + 1 - SS_e(n_1^{-1} - n_2^{-1}). \tag{2.12}$$

Let $x_{jk}$, $k = 1, \ldots, n_j$, denote the $x_i$ vectors for the $n_j$ observations
from $\pi_j$, and let $\hat y_{jk}$ denote the fitted value corresponding to $x_{jk}$. Then

$$b'\bar x_j = \sum_k b'x_{jk}/n_j = \sum_k (\hat y_{jk} - a)/n_j = -a + \sum_k \hat y_{jk}/n_j. \tag{2.13}$$

Also,

$$\sum_k \hat y_{1k} = y'\hat y = y'y - (y - \hat y)'y = n_1 - (y - \hat y)'(y - \hat y) = n_1 - SS_e \tag{2.14}$$

and

$$\sum_k \hat y_{2k} = \sum_i \hat y_i - \sum_k \hat y_{1k} = n_1 - (n_1 - SS_e) = SS_e. \tag{2.15}$$

The result follows from substituting (2.13)-(2.15) in the left member of
(2.12), completing the proof of Theorem 1.

It is well-known that the F-statistic for testing the hypothesis
$H\colon \beta = 0$ (or, equivalently, $\mu_1 = \mu_2$), calculated as though the linear
model (2.1) applied with normally distributed errors $e_i \sim N(0, \sigma^2)$, is
a multiple of Hotelling's two-sample $T^2$ statistic

$$T^2 = c\,d'S^{-1}d = cD_q^2, \tag{2.16}$$

where $S = A/(n - 2)$ is the usual unbiased estimator of $\Sigma$, and $D_q^2$ is
Mahalanobis' $D^2$ statistic (Fisher, 1938). From (2.6)-(2.8), we see that

$$SS(\mathrm{reg}) = c\,b'd = cn\,d'A^{-1}d/K = cD_q^2(SS_e)/(n - 2). \tag{2.17}$$

Hence, the F-statistic is

$$F = (n - q - 1)SS(\mathrm{reg})/[q(SS_e)] = (n - q - 1)T^2/[q(n - 2)]. \tag{2.18}$$

Under the assumption that the observation vectors $x_i$ are sampled
from two multivariate normal distributions $N_q(\mu_j, \Sigma)$, $j = 1, 2$, it
follows that $F \sim F(q, n - q - 1)$ under $H$ (Anderson, 1958, p. 109).
Lehmann (1959) showed that this test is the uniformly most powerful
(UMP) invariant test of $H$.
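The identity (2.18) between the overall regression F-statistic and Hotelling's $T^2$ can be confirmed on simulated data. This is a minimal sketch assuming NumPy; the sample sizes, seed, and mean shift are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, q = 25, 35, 4
n = n1 + n2
x1 = rng.standard_normal((n1, q)) + 0.5
x2 = rng.standard_normal((n2, q))
X = np.vstack([x1, x2])
y = np.concatenate([np.ones(n1), np.zeros(n2)])

# Hotelling's two-sample T^2, equation (2.16).
xbar1, xbar2 = x1.mean(axis=0), x2.mean(axis=0)
d = xbar1 - xbar2
A = (x1 - xbar1).T @ (x1 - xbar1) + (x2 - xbar2).T @ (x2 - xbar2)
S = A / (n - 2)                        # unbiased estimator of Sigma
c = n1 * n2 / n
T2 = c * d @ np.linalg.solve(S, d)
F_hotelling = (n - q - 1) * T2 / (q * (n - 2))

# F-statistic from the least squares fit of y on x, as if (2.1) held.
D = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
ss_e = np.sum((y - D @ coef) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)   # equals c by (2.9)
ss_reg = ss_tot - ss_e
F_ols = (n - q - 1) * ss_reg / (q * ss_e)
```

Here `F_ols` and `F_hotelling` coincide up to rounding, and `ss_tot` equals `c`, as stated in (2.9).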

To test $H_1\colon \beta_{p+1} = \cdots = \beta_q = 0$, one can follow the linear
model paradigm to calculate an F-statistic $F_1$ comparing $SS_e$ with the
residual sum of squares $SS_\omega$ calculated after omitting the last $q - p$
components of $x$ as independent variables. Since it follows from
(2.17) that

$$SS_e = c/(1 + CD_q^2) \tag{2.19}$$

where $C = c/(n - 2)$, we see that

$$F_1 = k(SS_\omega/SS_e - 1) = kC(D_q^2 - D_p^2)/(1 + CD_p^2) \tag{2.20}$$

where $k = (n - q - 1)/(q - p)$. Rao (1946, 1948) derived $F_1$ as the
likelihood ratio test statistic for testing $H_1$ in the two-sample
multivariate normal case and showed that $F_1 \sim F(q - p, n - q - 1)$
under $H_1$. Giri (1964) proved that Rao's test is the UMP invariant
similar test of $H_1$. These results are summarized in the following
theorem.

Theorem 2. The F-statistics (2.18) and (2.20) derived from
the linear model paradigm provide valid tests of $H\colon \beta = 0$ and
$H_1\colon \beta_{p+1} = \cdots = \beta_q = 0$.

To extend the linear model paradigm further, we define the
standard error of $\hat\beta_k$ and the t-statistic $t_k$ for testing $H_k\colon \beta_k = 0$
by

$$\mathrm{s.e.}(\hat\beta_k) = K[\mathrm{s.e.}(b_k)], \qquad t_k = \hat\beta_k/\mathrm{s.e.}(\hat\beta_k) = b_k/\mathrm{s.e.}(b_k) \tag{2.21}$$

where $\mathrm{s.e.}(b_k)$ is the standard error of $b_k$ calculated from fitting
the $y_i$'s to the $x_i$'s by OLS. Then it follows from specializing Rao's
result to the case $p = q - 1$ that $t_k$ provides a valid test of $H_k$ in
the sense that $t_k^2 \sim F(1, n - q - 1)$. This suggests, but does not
prove, that $t_k$ has a t distribution under $H_k$. While this result will
be proved below, it is not true that $(\hat\beta_k - \beta_k)/\mathrm{s.e.}(\hat\beta_k)$ has a t distribution
when $\beta_k \ne 0$. The problems of providing unbiased estimators and
confidence intervals for $\beta_k$ will be treated later.


3.  THE  POLYTOMOUS  CASE 


The development above for the dichotomous case provides little
insight as to why following the linear model paradigm might lead to
valid tests and estimates for the logistic regression model. To further
illuminate the dichotomous case and provide a basis for establishing
analogous results for the polytomous case, we begin by considering how
the parameters of the logistic regression model are related to certain
regression coefficients whose MLEs are least-squares estimators.

In the logistic regression model given by (1.2), the conditional
probabilities $p(j|x)$ are specified in terms of parameters $\gamma_j$ and $\delta_j$
that are functions of $p_j$, $\mu_j$, and $\Sigma$. A second parameterization that is
more convenient for this development results from dividing the numerator
and denominator of (1.2) by $\exp(\gamma_m + \delta_m'x)$ and setting $\alpha_j = \gamma_j - \gamma_m$
and $\beta_j = \delta_j - \delta_m$. This yields the parameterization

$$p(j|x) = \exp(\alpha_j + \beta_j'x) \Big/ \Big[1 + \sum_{k=1}^{m-1} \exp(\alpha_k + \beta_k'x)\Big] \quad \text{for } j = 1, 2, \ldots, m-1; \qquad p(m|x) = 1 - \sum_{j=1}^{m-1} p(j|x). \tag{3.1}$$

It follows from (1.7) that

$$\beta_j = \Sigma^{-1}(\mu_j - \mu_m), \qquad \alpha_j = \log(p_j/p_m) - \beta_j'(\mu_j + \mu_m)/2. \tag{3.2}$$


Letting the (i, j) elements of $\Sigma$ and $\Sigma^{-1}$ be denoted by $\sigma_{ij}$ and $\sigma^{ij}$ in
the sequel, we recall that, if $x \sim N_q(\mu_j, \Sigma)$, then the conditional distribution
of $x_k$, given the other components of $x$, is $N(\xi_{jk} + \theta_k'x,\ 1/\sigma^{kk})$, where $\theta_k$
is a q-dimensional vector with $\theta_{kk} = 0$, $\theta_{ki} = -\sigma^{ki}/\sigma^{kk}$ for $i \ne k$, and
$\xi_{jk} = \mu_{jk} - \theta_k'\mu_j$. (See Anderson, 1958, pp. 28, 42.) The "constant terms"
$\xi_{jk}$, $k = 1, 2, \ldots, q$, are related to the components of the logistic
regression coefficient vectors $\delta_j = \Sigma^{-1}\mu_j$, since the kth component of $\delta_j$ is

$$\delta_{jk} = \sum_i \sigma^{ki}\mu_{ji} = \sigma^{kk}(\mu_{jk} - \theta_k'\mu_j) = \sigma^{kk}\xi_{jk}. \tag{3.3}$$

Hence, the components of $\beta_j$ are given by

$$\beta_{jk} = \sigma^{kk}(\xi_{jk} - \xi_{mk}). \tag{3.4}$$

Let $v_1, v_2, \ldots, v_m$ denote the indicator variables for the subpopulations
$\pi_1, \pi_2, \ldots, \pi_m$. Then it follows from the above that the components
$x_k$ of the observation vectors $x$ satisfy the linear model

$$x_k = \sum_{j=1}^{m} \xi_{jk}v_j + \theta_k'x + e_k = \xi_{mk} + \theta_k'x + \sum_{j=1}^{m-1} (\xi_{jk} - \xi_{mk})v_j + e_k, \tag{3.5}$$

where $e_k \sim N(0, 1/\sigma^{kk})$. By relabeling $v_1, \ldots, v_{m-1}$ as $x_{q+1}, \ldots, x_{q+m-1}$,
this can be rewritten in the form

$$x_k = \xi_{mk} + \theta_k'x + \sum_{j=1}^{m-1} \theta_{k,q+j}\,x_{q+j} + e_k, \tag{3.6}$$

where

$$\theta_{k,q+j} = \xi_{jk} - \xi_{mk} = \beta_{jk}/\sigma^{kk}. \tag{3.7}$$


By reexpressing each of the joint densities $f_j(x)$ in the likelihood
function (1.9) as a product of the conditional density of $x_k$ times the
joint density of the other components of $x$, we see that the MLEs of the
regression coefficients in (3.6) are the least-squares estimators, and
the MLE of the conditional variance $1/\sigma^{kk}$ is the residual sum of squares
$SS(x_k)$ divided by n. As is well known (e.g., Rao, 1965, p. 224), these
estimators can be obtained by inverting the augmented sample covariance
matrix $S$ with elements

$$s_{ij} = \sum_{t} (x_{it} - \bar x_i)(x_{jt} - \bar x_j)/n \tag{3.8}$$

for $i, j = 1, 2, \ldots, q + m - 1$. The MLEs are given in terms of the
elements $s^{ij}$ of $S^{-1}$ by

$$\hat\theta_{kj} = -s^{kj}/s^{kk} \ \text{ for } j \ne k, \qquad \hat\sigma^{kk} = s^{kk} = n/SS(x_k). \tag{3.9}$$

Hence, by (3.7),

$$\hat\beta_{jk} = \hat\sigma^{kk}\hat\theta_{k,q+j} = -s^{k,q+j} \tag{3.10}$$

for $j = 1, 2, \ldots, m-1$. Noting that the last member of (3.10) can also be
obtained by using $x_{q+j}$ ($= v_j$) as the regressand, we obtain that

$$\hat\beta_{jk} = -s^{q+j,k} = b_{jk}\,s^{q+j,q+j}, \tag{3.11}$$

where $b_{jk} = \hat\theta_{q+j,k}$ is the regression coefficient on $x_k$ when $v_j$ is regressed
linearly on $x_1, \ldots, x_q$ and the other $v_k$'s, and, by (3.9), $s^{q+j,q+j} = n/SS_j$
with $SS_j$ the residual sum of squares from that regression.

This development indicates that the linear model paradigm for the
dichotomous case can be extended to the polytomous case. The procedure
consists of first fitting the observations $(x_i, v_{1i}, \ldots, v_{m-1,i})$
by least squares as if they satisfied the linear model

$$v_{ji} = \alpha_j + \beta_j'x_i + \sum_{k \ne j} \gamma_{jk}v_{ki} + e_{ji} \tag{3.12}$$

for each j. This provides ILS estimates $a_j$ and $b_j$ of $\alpha_j$ and $\beta_j$, as well
as the residual sum of squares $SS_j$, which can then be transformed to yield
the MLEs. The process can be summarized as follows:


Theorem 3. The MLEs of the polytomous logistic regression coefficients
(3.2) are related to the ILS estimates $a_j$ and $b_j$ by

$$\hat\beta_j = K_jb_j, \qquad \hat\alpha_j = \log(\hat p_j/\hat p_m) + K_j(a_j - 1/2) + n(n_j^{-1} - n_m^{-1})/2, \tag{3.13}$$

where $K_j = n/SS_j$.

The formula for $\hat\beta_j$ is a restatement of (3.11). The derivation of the
formula for $\hat\alpha_j$ is similar to the proof for the dichotomous case. See
(2.11) to (2.15).
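The procedure of Theorem 3 can be sketched as follows, assuming NumPy and simulated data from m = 3 normal populations with common (identity) covariance; the sample sizes, means, seed, and helper names `direct` and `via_ols` are illustrative assumptions. For each j, the indicator $v_j$ is regressed on an intercept, $x$, and the other indicators among $v_1, \ldots, v_{m-1}$, and the ILS estimates are converted by (3.13), with $\hat p_j/\hat p_m = n_j/n_m$ as in Case I.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [30, 40, 50]                       # n_1, n_2, n_m
m, q = 3, 2
n = sum(sizes)
means = np.array([[1.0, 0.0], [0.0, 1.5], [0.0, 0.0]])
groups = [rng.standard_normal((nj, q)) + mu for nj, mu in zip(sizes, means)]
X = np.vstack(groups)
labels = np.repeat(np.arange(m), sizes)
# Indicator variables v_1, ..., v_{m-1}; population m is the reference.
V = (labels[:, None] == np.arange(m - 1)).astype(float)

# Quantities needed for the direct MLEs (3.18).
xbars = np.array([g.mean(axis=0) for g in groups])
A = sum((g - xb).T @ (g - xb) for g, xb in zip(groups, xbars))
sigma_hat = A / n

def direct(j):
    """MLEs of (alpha_j, beta_j) from sample means and pooled covariance."""
    beta = np.linalg.solve(sigma_hat, xbars[j] - xbars[m - 1])
    alpha = np.log(sizes[j] / sizes[m - 1]) - beta @ (xbars[j] + xbars[m - 1]) / 2
    return alpha, beta

def via_ols(j):
    """MLEs of (alpha_j, beta_j) from the least squares fit (3.12) and (3.13)."""
    others = np.delete(V, j, axis=1)       # the other v_k's, k != j
    D = np.column_stack([np.ones(n), X, others])
    coef, *_ = np.linalg.lstsq(D, V[:, j], rcond=None)
    a_j, b_j = coef[0], coef[1:1 + q]
    ss_j = np.sum((V[:, j] - D @ coef) ** 2)
    K_j = n / ss_j
    beta = K_j * b_j
    alpha = (np.log(sizes[j] / sizes[m - 1]) + K_j * (a_j - 0.5)
             + n * (1 / sizes[j] - 1 / sizes[m - 1]) / 2)
    return alpha, beta

results = [(direct(j), via_ols(j)) for j in range(m - 1)]
```

For each j the two routes return the same pair of estimates up to rounding error.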

As in the dichotomous case, these formulas for the MLEs apply whether
(1) the individuals are sampled at random from the population consisting
of m subpopulations $\pi_1, \ldots, \pi_m$, or (2) the observations arise from separate
samples of fixed sizes $n_1, \ldots, n_m$ from $\pi_1, \ldots, \pi_m$. In the first case,
the MLEs of the $p_j$'s are $\hat p_j = n_j/n$; in the second, the $p_j$'s are assumed to
be known.

Next we consider whether the t-statistics derived from the linear model
paradigm provide valid tests of the hypotheses $H\colon \beta_{jk} = 0$. Since
$\beta_{jk} = \sigma^{kk}\theta_{k,q+j}$ by (3.7), $H$ is equivalent to the hypothesis that $\theta_{k,q+j} = 0$
in (3.6). Under the Case II (separate sample) assumptions, the UMP unbiased
test of $H$ is based on the t-statistic

$$t = \hat\theta_{k,q+j}/\mathrm{s.e.}(\hat\theta_{k,q+j}), \tag{3.14}$$

which has a $t(n - m - q + 1)$ distribution under $H$. The analogous t-statistic
when $v_j$ is regressed linearly on $x_1, x_2, \ldots, x_q$ and the other $v_k$'s is

$$t = b_{jk}/\mathrm{s.e.}(b_{jk}), \tag{3.15}$$

where the standard error in the denominator is calculated as if (3.12)
applied with the error terms satisfying the usual linear model assumptions.
To see that these two t-statistics are identical, it suffices
to recall that both can be calculated from the sample partial correlation
coefficient $r_{jk.c}$ between $v_j$ and $x_k$ using the formula

$$t = \nu^{1/2}\,r_{jk.c}/(1 - r_{jk.c}^2)^{1/2}, \tag{3.16}$$

where $\nu = n - m - q + 1$.

In concluding that the t-statistic (3.15) has the same properties
as those cited for the one in (3.14), one must recognize that these
properties depend on the Case II assumptions. In Case I, the conclusions
need qualification, because the $n_j$'s are random variables that can be
zero with positive probability. While these results and others to follow
can be restated as conditional results given any nonzero values of the
$n_j$'s for which $n > m + q - 1$, we shall simply assume that the Case II
assumptions apply with fixed nonzero sample sizes $n_j$. With this proviso,
the validity of the t-statistic (3.15) can be stated as follows:

Theorem 4. The t-statistic (3.15) for testing $H\colon \beta_{jk} = 0$ derived
from fitting (3.12) by ordinary least squares has a $t(\nu)$ distribution
under $H$, and rejecting $H$ for $|t| > t_{\alpha/2}(\nu)$ provides the UMP unbiased
test of size $\alpha$.


If we define the standard error of $\hat\beta_{jk}$ using

$$\mathrm{s.e.}(\hat\beta_{jk}) = K_j[\mathrm{s.e.}(b_{jk})], \tag{3.17}$$

then $t = \hat\beta_{jk}/\mathrm{s.e.}(\hat\beta_{jk})$ provides a valid t-statistic for testing whether
$\beta_{jk} = 0$. However, it is not the case that $(\hat\beta_{jk} - \beta_{jk})/\mathrm{s.e.}(\hat\beta_{jk})$ has a
Student's t distribution except when $\beta_{jk} = 0$. The problem of providing
an approximate pivotal quantity for $\beta_{jk}$ will be treated below after
considering the bias of the MLEs.

By (1.12), alternative formulas for $\hat\alpha_j$ and $\hat\beta_j$ are given by

$$\hat\beta_j = \hat\Sigma^{-1}(\bar x_j - \bar x_m), \qquad \hat\alpha_j = \log(\hat p_j/\hat p_m) - (Q_j - Q_m)/2, \tag{3.18}$$

where $Q_j = \bar x_j'\hat\Sigma^{-1}\bar x_j$. Das Gupta (1968) examined the moments and asymptotic
distribution of the discriminant function coefficients $\hat\beta$ in the dichotomous
case. A key result in his derivation is that, if $A$ has a Wishart distribution
$W_q(\Sigma, N)$, then $E(A^{-1}) = \Sigma^{-1}/(N - q - 1)$. Applying this result
to $\hat\beta_j$ and observing that the matrix $A = n\hat\Sigma$ in (1.11) has a $W_q(\Sigma, n - m)$
distribution and is independent of the mean vectors, we see that $E(\hat\beta_j) =
n\beta_j/(\nu - 2)$. Hence, an unbiased estimator of $\beta_j$ is

$$\tilde\beta_j = (\nu - 2)\hat\beta_j/n = C_jb_j \tag{3.19}$$

where $C_j = (n - m - q - 1)/SS_j$.

To remove the bias in $\hat\alpha_j$, first note that

$$E(Q_j) = E[E(\bar x_j'\hat\Sigma^{-1}\bar x_j \mid \bar x_j)] = nE(\bar x_j'\Sigma^{-1}\bar x_j)/(\nu - 2) = n[(q/n_j) + \mu_j'\Sigma^{-1}\mu_j]/(\nu - 2). \tag{3.20}$$

It follows that an unbiased estimator of $\alpha_j$ is

$$\tilde\alpha_j = \log(p_j/p_m) - [(\nu - 2)(Q_j - Q_m)/n - q(n_j^{-1} - n_m^{-1})]/2. \tag{3.21}$$

By (3.13), this can also be written in the form

$$\tilde\alpha_j = \log(p_j/p_m) + C_j(a_j - 1/2) + (n - m - 1)(n_j^{-1} - n_m^{-1})/2. \tag{3.22}$$

The unbiased estimators in (3.19) and (3.22) are functions of the
sample mean vectors $\bar x_j$ and the pooled sample covariance matrix $\hat\Sigma$, which
are sufficient statistics under the Case II assumptions. Moreover,
$(\bar x_1, \ldots, \bar x_m, \hat\Sigma)$ is complete, as can be shown by a proof analogous to
that given in the one-sample case (Anderson, 1958, p. 117). It follows
from the Lehmann-Scheffé Theorem that $\tilde\alpha_j$ and $\tilde\beta_j$ satisfy the following
optimality property.

Theorem 5. The estimators $\tilde\alpha_j$ and $\tilde\beta_j$ given in (3.22) and (3.19)
are the uniformly minimum variance unbiased estimators of $\alpha_j$ and $\beta_j$.
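The agreement between the two forms of the unbiased estimators, (3.19) with (3.22) from the least squares fit and the direct forms based on (3.18) and (3.21), can be checked numerically. The sketch below assumes NumPy, simulated Case II data with fixed sample sizes, and priors treated as known; all numerical values and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = np.array([20, 25, 30])             # fixed n_1, n_2, n_m (Case II)
priors = np.array([0.3, 0.3, 0.4])         # p_j assumed known in Case II
m, q = 3, 2
n = sizes.sum()
nu = n - m - q + 1
shifts = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
groups = [rng.standard_normal((nj, q)) + mu for nj, mu in zip(sizes, shifts)]
X = np.vstack(groups)
labels = np.repeat(np.arange(m), sizes)
V = (labels[:, None] == np.arange(m - 1)).astype(float)

xbars = np.array([g.mean(axis=0) for g in groups])
A = sum((g - xb).T @ (g - xb) for g, xb in zip(groups, xbars))
sigma_hat = A / n
# Q_j = xbar_j' Sigma_hat^{-1} xbar_j, as defined after (3.18).
Q = np.array([xb @ np.linalg.solve(sigma_hat, xb) for xb in xbars])

j = 0                                      # compare population 1 with population m
inv_diff = 1 / sizes[j] - 1 / sizes[m - 1]
log_ratio = np.log(priors[j] / priors[m - 1])

# Direct forms: (3.19) applied to (3.18), and (3.21).
beta_tilde_direct = (nu - 2) / n * np.linalg.solve(sigma_hat, xbars[j] - xbars[m - 1])
alpha_tilde_direct = log_ratio - ((nu - 2) * (Q[j] - Q[m - 1]) / n - q * inv_diff) / 2

# Least squares forms: C_j b_j in (3.19) and (3.22).
D = np.column_stack([np.ones(n), X, np.delete(V, j, axis=1)])
coef, *_ = np.linalg.lstsq(D, V[:, j], rcond=None)
a_j, b_j = coef[0], coef[1:1 + q]
ss_j = np.sum((V[:, j] - D @ coef) ** 2)
C_j = (n - m - q - 1) / ss_j
beta_tilde = C_j * b_j
alpha_tilde = log_ratio + C_j * (a_j - 0.5) + (n - m - 1) * inv_diff / 2
```

The only change from the MLE conversion is the multiplier: $C_j = (\nu - 2)/SS_j$ replaces $K_j = n/SS_j$, and $n - m - 1$ replaces $n$ in the last term.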


If one defines the standard error of $\tilde\beta_{jk}$ using

$$\mathrm{s.e.}(\tilde\beta_{jk}) = C_j[\mathrm{s.e.}(b_{jk})], \tag{3.23}$$

then $t = \tilde\beta_{jk}/\mathrm{s.e.}(\tilde\beta_{jk})$ has a $t(\nu)$ distribution when $\beta_{jk} = 0$ by Theorem 4.
Although the quantity

$$(\tilde\beta_{jk} - \beta_{jk})/\mathrm{s.e.}(\tilde\beta_{jk}) \tag{3.24}$$

only has a $t(\nu)$ distribution when $\beta_{jk} = 0$, it can still be used as an
approximate pivotal quantity for generating confidence intervals. This
quantity is closely related to a bona fide pivotal quantity having a
$t(\nu)$ distribution suggested by the model (3.6), namely,

$$t = (\hat\theta_{k,q+j} - \theta_{k,q+j})/\mathrm{s.e.}(\hat\theta_{k,q+j}). \tag{3.25}$$

By (3.7), (3.10), (3.14), and (3.15), this can be rewritten in the form

$$t = (\hat\beta_{jk}/s^{kk} - \beta_{jk}/\sigma^{kk}) \big/ [(K_j/s^{kk})\,\mathrm{s.e.}(b_{jk})] = (G\tilde\beta_{jk} - \beta_{jk}) \big/ G[\mathrm{s.e.}(\tilde\beta_{jk})], \tag{3.26}$$

where $G = n\sigma^{kk}/\nu s^{kk} = SS(x_k)/\nu(1/\sigma^{kk})$. Noting that $SS(x_k)/\nu$ is the
usual unbiased estimator of $1/\sigma^{kk}$ (the conditional variance of $x_k$ given
the other components of $x$), we see that $E(G) = 1$ and $\mathrm{Var}(G) = 2/\nu$. Hence,
for moderately large values of $\nu$, omitting the factors $G$ in (3.26) and
using (3.24) instead should provide good approximations.


4.  MATRIX  FORMULATION 


Let the augmented sample covariance matrix $S = (s_{ij})$ defined in
(3.8) be partitioned into

$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \tag{4.1}$$

where $S_{11}$ is the sample covariance matrix of $x_1, \ldots, x_q$, and $S_{22}$ is
the sample covariance matrix of $v_1, \ldots, v_{m-1}$. Then

$$S^{-1} = \begin{pmatrix} S_{11.2}^{-1} & -S_{11.2}^{-1}S_{12}S_{22}^{-1} \\ -S_{22}^{-1}S_{21}S_{11.2}^{-1} & S^{22} \end{pmatrix}, \tag{4.2}$$

where

$$S_{11.2} = S_{11} - S_{12}S_{22}^{-1}S_{21}, \qquad S^{22} = S_{22}^{-1} + S_{22}^{-1}S_{21}S_{11.2}^{-1}S_{12}S_{22}^{-1}. \tag{4.3}$$

The submatrices of $S^{-1}$ in (4.2) have interesting interpretations
in discriminant analysis and logistic regression. By (3.10), the elements
in the upper right-hand corner are the negatives of the discriminant
coefficient estimates $\hat\beta_{jk}$. This might have been deduced from (4.2) and
(3.18) by recognizing that $S_{12}S_{22}^{-1}$ is the matrix of least-squares regression
coefficients of the components of $x$ on $v_1, \ldots, v_{m-1}$, and $S_{11.2}$
is the residual sum of squares and cross-products matrix (Anderson, 1958,
p. 81). Since the relevant regression equations are of the form (3.5)
except that the term $\theta_k'x$ is missing, it follows that the columns of
$S_{12}S_{22}^{-1}$ are the vectors $\bar x_j - \bar x_m$, $j = 1, 2, \ldots, m - 1$, and $S_{11.2} = \hat\Sigma$.
Hence, if $\hat B$ is defined to be the $q \times (m - 1)$ matrix having $\hat\beta_j$ as
its jth column, then it follows that

$$\hat B = S_{11.2}^{-1}S_{12}S_{22}^{-1}. \tag{4.4}$$

By (3.9) and (3.13), the diagonal elements of $S^{22}$ are the multiples
$K_j = n/SS_j$ used in converting the ILS estimates to the MLEs.

Clearly, the estimates $\hat\beta_{jk}$ and their test statistics can be
calculated directly from $S^{-1}$. Following the linear model paradigm is
simply a mnemonic technique for adopting standard least-squares procedures
to isolate the appropriate elements of $S^{-1}$.
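The matrix route can be sketched in a few lines, assuming NumPy and simulated data (sizes, means, seed, and the helper `residual_ss` are illustrative assumptions). The augmented covariance matrix (3.8) is inverted once; the negative of its upper right-hand block gives the coefficient vectors $\hat\beta_j$, and the last $m - 1$ diagonal elements give the multipliers $K_j$.

```python
import numpy as np

rng = np.random.default_rng(4)
sizes = [30, 35, 45]
m, q = 3, 2
n = sum(sizes)
shifts = np.array([[1.0, 0.5], [-0.5, 1.0], [0.0, 0.0]])
groups = [rng.standard_normal((nj, q)) + mu for nj, mu in zip(sizes, shifts)]
X = np.vstack(groups)
labels = np.repeat(np.arange(m), sizes)
V = (labels[:, None] == np.arange(m - 1)).astype(float)

# Augmented sample covariance matrix (3.8) of (x_1..x_q, v_1..v_{m-1}), divisor n.
Z = np.column_stack([X, V])
Zc = Z - Z.mean(axis=0)
S = Zc.T @ Zc / n
S_inv = np.linalg.inv(S)

B_from_inverse = -S_inv[:q, q:]            # column j is beta_hat_j
K_from_inverse = np.diag(S_inv)[q:]        # K_j = n / SS_j

# Cross-checks against the direct formulas.
xbars = np.array([g.mean(axis=0) for g in groups])
A = sum((g - xb).T @ (g - xb) for g, xb in zip(groups, xbars))
sigma_hat = A / n
B_direct = np.column_stack([np.linalg.solve(sigma_hat, xbars[j] - xbars[m - 1])
                            for j in range(m - 1)])

def residual_ss(j):
    """Residual sum of squares when v_j is regressed on x and the other v_k's."""
    D = np.column_stack([np.ones(n), X, np.delete(V, j, axis=1)])
    coef, *_ = np.linalg.lstsq(D, V[:, j], rcond=None)
    return np.sum((V[:, j] - D @ coef) ** 2)

K_direct = np.array([n / residual_ss(j) for j in range(m - 1)])
```

A single matrix inversion therefore yields all $m - 1$ coefficient vectors at once, whereas the regression route requires one least squares fit per indicator.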
5. EFFICIENCY AND ROBUSTNESS


Logistic regression is often applied in situations in which the
normality assumptions are known to be violated, e.g., in cases in which
one or more of the independent variables are dichotomous. Several
authors (e.g., Press and Wilson, 1978) have recommended against the
use of discriminant function estimators except in the
rare instances when the normality assumptions apply.


A commonly recommended alternative to the discriminant function
estimators when the normality assumptions do not apply is provided by the
conditional maximum likelihood estimators (CMLEs). These estimators are
defined as the values of $\alpha_j$ and $\beta_j$ that maximize the conditional
likelihood function

$$L_C = \prod_{j=1}^{m} \prod_{i=1}^{n} [p(j|x_i)]^{v_{ji}}, \tag{5.1}$$

where $p(j|x) = P(y = j \mid x)$ is given by (3.1) in the polytomous case and
(1.3) in the dichotomous case. Of course, the CMLEs are the maximum
likelihood estimators if the $x_i$'s are constant vectors or if the marginal
distributions of the $x_i$'s do not depend on $\alpha_j$ and $\beta_j$, but even in these
cases the rationale for adopting the CMLEs in practice is unclear.

Under the normality assumptions imposed in the preceding sections,
one would expect that the discriminant estimators, being the unconditional
MLEs, would perform at least as well as the CMLEs in large samples.
Efron (1975) confirmed this in the dichotomous case and showed that the
asymptotic efficiency of the CMLEs decreases markedly as the Mahalanobis
distance between the mean vectors $\mu_1$ and $\mu_2$ increases.

Despite  this  lack  of  efficiency  in  the  normal  case,  it  is  often 
contended  that  the  CMLEs  are  preferable  because  they  are  more  robust 
in  the  nonnormal  case.  In  any  case,  the  CMLEs  raise  thorny  computational 
and  theoretical  problems,  and  there  may  be  some  difficulty  in  determining 
whether  the  CMLEs  exist  for  a  given  sample  in  the  polytomous  case.  In 
the dichotomous case, the CMLEs do not exist if there is some linear
combination $d'x$ such that the values $d'x_i$ for those individuals
having $y_i = 1$ are all larger (or smaller) than the corresponding
values for those individuals for which $y_i = 0$. If the CMLEs exist,
they are often calculated using an iterative procedure, such as the
method introduced by Walker and Duncan (1967), that may require a number
of passes through the data. Test statistics associated with the CMLEs
are based on asymptotic properties of MLEs that are of questionable
validity in small samples.
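
The nonexistence condition in the dichotomous case can be illustrated
numerically. The following is a minimal sketch, assuming NumPy; the
function names `separates` and `cond_loglik` and the toy data are
hypothetical and are not taken from the paper. When some direction d
separates the two groups, the conditional log-likelihood keeps rising
toward zero as the coefficients are scaled up along d, so no finite
maximizer exists.

```python
import numpy as np

def separates(d, X, y):
    """True if the scores X @ d for y == 1 lie entirely above (or below) those for y == 0."""
    scores = X @ d
    s1, s0 = scores[y == 1], scores[y == 0]
    return bool(s1.min() > s0.max() or s1.max() < s0.min())

def cond_loglik(alpha, beta, X, y):
    """Conditional log-likelihood of the dichotomous logistic model."""
    eta = alpha + X @ beta
    signed = np.where(y == 1, eta, -eta)
    # log p = -log(1 + exp(-eta)) when y = 1; log(1 - p) = -log(1 + exp(eta)) when y = 0
    return float(-np.sum(np.logaddexp(0.0, -signed)))

# Hypothetical toy data: the single variable x separates the groups at 4.5.
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])
d = np.array([1.0])

is_separated = separates(d, X, y)
# Scale the coefficients (alpha, beta) = c * (-4.5, 1) by increasing factors c.
logliks = [cond_loglik(-4.5 * c, c * d, X, y) for c in (0.1, 1.0, 10.0, 100.0)]
```

In this sketch `logliks` increases with every scale factor, which is the
numerical symptom an iterative fitting routine would encounter on
separated data.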

While the t-statistics associated with the discriminant function
estimators provide exact tests when the normality assumptions apply,
the robustness of these statistics is open to question when the
normality assumptions are violated. To provide some evidence on this
score, consider the case in which there is just a single independent
variable x in the dichotomous logistic model. It follows from the
identification of the t-statistics (3.14) and (3.15) that the statistic
$t = \hat\beta/\mathrm{s.e.}(\hat\beta)$ associated with the coefficient
$\beta$ on x is the ordinary two-sample t-statistic

$$t = (\bar{x}_1 - \bar{x}_2)/[S^2(n_1^{-1} + n_2^{-1})]^{1/2}. \qquad (5.2)$$

As is well known, two-sided tests and confidence intervals based on
this statistic are quite robust to departures from normality, even
when the $x_i$'s are dichotomous.
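
The identity behind (5.2) can be checked numerically. The sketch below
assumes NumPy, codes the group indicator as 1 for the first sample and 0
for the second, and uses simulated data; the function names and the data
are hypothetical. It compares the pooled two-sample t-statistic with the
slope t-statistic from an ordinary least squares regression of the
indicator on x.

```python
import numpy as np

def two_sample_t(x1, x2):
    """Pooled-variance two-sample t-statistic, as in (5.2)."""
    n1, n2 = len(x1), len(x2)
    ss = np.sum((x1 - x1.mean()) ** 2) + np.sum((x2 - x2.mean()) ** 2)
    s2 = ss / (n1 + n2 - 2)  # pooled variance estimate
    return (x1.mean() - x2.mean()) / np.sqrt(s2 * (1.0 / n1 + 1.0 / n2))

def ols_slope_t(x, y):
    """t-statistic for the slope in the least squares fit of y on (1, x)."""
    n = len(x)
    Z = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (n - 2)
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1] / np.sqrt(cov[1, 1])

# Hypothetical simulated samples
rng = np.random.default_rng(0)
x1 = rng.normal(1.0, 1.0, size=12)
x2 = rng.normal(0.0, 1.0, size=15)

x = np.concatenate([x1, x2])
y = np.concatenate([np.ones(len(x1)), np.zeros(len(x2))])

t_pooled = two_sample_t(x1, x2)
t_ols = ols_slope_t(x, y)
```

The two values agree up to floating-point error for any x, including a
dichotomous one, because both are the same function of the correlation
between x and the indicator.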

A case that would seem to favor the CMLEs is the case in which the
$x_i$'s are constant vectors. Berkson (1955) made a thorough study of the
performance of the CMLEs (here, MLEs) in bio-assay situations where the
$x_i$'s are preassigned dosage levels. He showed that his minimum logit
chi-square estimators and the minimum chi-square estimators perform
considerably better than the MLEs in applications of this type, even
if one excludes the cases in which the MLEs fail to exist. The poor
performance of the CMLEs in this case as well as the normal case raises
questions about the widespread use of the CMLEs in practice.

This  is  not  to  say  that  the  discriminant  function  estimators  would 
perform  any  better  in  bio-assay  situations.  Halperin,  Blackwelder,  and 
Verter  (1971)  provide  compelling  arguments  to  effectively  eliminate  the 
discriminant  function  estimators  as  contenders  in  applications  in  which 
the  independent  variables  are  dichotomous.  However,  in  cases  like  these, 
the  data  lend  themselves  to  grouping,  so  that  one  can  use  the  readily 
calculated  minimum  logit  chi-square  estimators.  The  procedure  involves 
first transforming the group means using Berkson's logit transformation
or  a  modified  version  recommended  by  Anscombe  (1956)  and  then  fitting 
the  transformed  values  using  weighted  least  squares.  For  an  excellent 
discussion  of  these  methods,  see  Cox  (1970). 
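
The grouped-data procedure can be sketched as follows, assuming NumPy
and a single grouping variable. The function name `min_logit_chisq`, the
weights n p (1 - p), and the option that adds 1/2 to each cell (one
common form of the modified logit) are assumptions made for
illustration; the noise-free counts are hypothetical.

```python
import numpy as np

def min_logit_chisq(x, r, n, half_correction=False):
    """Weighted least squares fit of group logits on x.

    x: group covariate values; r: responders per group; n: group sizes.
    Returns the estimates (intercept, slope).
    """
    x, r, n = (np.asarray(a, dtype=float) for a in (x, r, n))
    if half_correction:
        # Modified logit: add 1/2 to each cell before taking logs (assumed form)
        logit = np.log((r + 0.5) / (n - r + 0.5))
        p = (r + 0.5) / (n + 1.0)
    else:
        p = r / n
        logit = np.log(p / (1.0 - p))
    w = n * p * (1.0 - p)          # inverse of the approximate logit variance
    Z = np.column_stack([np.ones_like(x), x])
    ZtW = Z.T * w                  # Z'W without forming the diagonal matrix W
    return np.linalg.solve(ZtW @ Z, ZtW @ logit)

# Hypothetical noise-free example: expected counts from a known logistic curve
alpha, beta = -1.0, 0.5
x = np.arange(5.0)
n = np.full(5, 1000.0)
r = n / (1.0 + np.exp(-(alpha + beta * x)))

est = min_logit_chisq(x, r, n)
est_half = min_logit_chisq(x, r, n, half_correction=True)
```

Because the fit is a single weighted regression on the group logits, it
requires no iteration, which is the computational appeal noted in the
text.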


In  applications  where  some  or  all  of  the  independent  variables 
are  continuous,  the  discriminant  function  estimators  merit  wider  use 
both  in  exploratory  work  associated  with  fitting  logistic  regression 
models  and  as  alternatives  to  (as  well  as  first  approximations  for) 
the  CMLEs.  The  main  reason  for  these  recommendations  stems  from  an 
empirical  observation--the  two  methods  ordinarily  yield  comparable 
results  in  practice.  Both  sets  of  estimates,  their  standard  errors, 
and  their  t-statistics  have  been  calculated  for  numerous  data  sets 
emanating  from  research  studies  at  The  Rand  Corporation  since  1974 
when  the  formulas  (2.3)  for  the  dichotomous  case  were  first  derived 
(Haggstrom,  1974).  Almost  without  exception,  the  results  from  applying 
the  two  procedures  have  been  interchangeable  for  most  practical  purposes 
in  that  corresponding  pairs  of  estimates  typically  differ  by  less  than 
a  standard  error  (no  matter  which  standard  error  is  used),  and  the 
t-statistics  for  the  two  procedures  are  usually  quite  close. 

In  their  excellent  paper  comparing  the  two  estimation  techniques, 
Halperin  et  al.  reported  the  results  of  using  both  procedures  in  fitting 
several  data  sets  that  included  both  continuous  and  discrete  independent 
variables.  For  the  most  part,  their  results  confirmed  the  close  agreement 
of  the  estimation  procedures,  although  they  reported  slightly  better  fits 
using  the  CMLEs.  They  found  that  the  absolute  values  of  the  t-statistics 
associated  with  the  CMLEs  tended  to  be  slightly  smaller  than  those  for 
the  discriminant  function  estimates,  but  this  may  have  resulted  from 
their  using  a  different  t-statistic  from  the  one  defined  in  (2.21). 


The observation that the two estimation procedures tend to yield
comparable results (even in cases where the appropriateness of the
logistic regression model is suspect) indicates that, whatever
robustness properties the estimators have to nonnormality and
misspecifications of the regression functions, the procedures seem to
share those properties in situations where the CMLEs exist and some of
the independent variables are continuous. Since neither procedure has
been shown to have a decided advantage based on theoretical grounds
(except perhaps in the normal case), it seems only reasonable to opt for
the computational facility of the discriminant function estimators,
especially in exploratory work with large data sets.

Another  consideration  that  favors  the  use  of  the  discriminant 
function  estimators  in  some  applications  is  that,  unlike  the  CMLEs, 
they  are  readily  adapted  to  handling  missing  values.  As  was  seen  in 
Section  4,  the  discriminant  function  estimates  can  be  calculated  directly 
from  the  augmented  sample  covariance  matrix.  In  the  missing  values 
case, one can mimic a common procedure for handling missing values in
linear models by simply substituting estimates of the elements $s_{ij}$
using observations on complete pairs. Alternative procedures and
software for carrying out this process are provided in BMDP-79 (Dixon
and Brown, 1979, Chapter 12). Chow (1979) discusses this and other
techniques for treating the missing value problem in logistic regression.
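
The complete-pairs substitution can be sketched as follows, assuming
NumPy and missing values coded as NaN. The function name `pairwise_cov`
and the simulated data are hypothetical. The divisor is the number of
complete pairs, an assumption chosen to match the divisor-n covariance
matrix that appears to be used in the numerical example.

```python
import numpy as np

def pairwise_cov(data):
    """Covariance matrix whose (a, b) element uses only rows where both
    column a and column b are observed (missing values coded as NaN)."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    S = np.empty((k, k))
    for a in range(k):
        for b in range(a, k):
            ok = ~np.isnan(data[:, a]) & ~np.isnan(data[:, b])
            xa, xb = data[ok, a], data[ok, b]
            # Divisor is the number of complete pairs (assumed convention)
            S[a, b] = S[b, a] = np.sum((xa - xa.mean()) * (xb - xb.mean())) / ok.sum()
    return S

# Hypothetical data: 50 observations on 3 variables, one value set to missing
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 3))
S_full = pairwise_cov(data)

data_missing = data.copy()
data_missing[0, 0] = np.nan
S_pair = pairwise_cov(data_missing)
```

A matrix assembled this way is not guaranteed to be positive definite,
so it is worth checking that it can be inverted sensibly before reading
estimates off its inverse.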


6.  A  NUMERICAL  EXAMPLE 


As an example to illustrate how a polytomous logistic regression
model can be fitted by ordinary least squares, we report an analysis
of 300 observations on participants in the National Longitudinal Study
of the High School Class of 1972. The scores $x_1$ and $x_2$ are the
seniors' Scholastic Aptitude Test scores (verbal and quantitative)
divided by 100, and $v_1$, $v_2$, and $v_3$ are indicator variables for
three categories of postsecondary activities: (1) college attendance,
(2) military service, and (3) other. Some summary statistics for
comparing the three groups are given in Table 1.

                          Table 1

                     SUMMARY STATISTICS

                      Means         Std. dvn.
Groups       n      x1     x2      x1     x2     r(x1,x2)

1          169     4.79   5.29    1.14   1.17     0.69
2           20     4.30   4.61    0.94   0.96     0.52
3          111     4.20   4.69    0.96   1.07     0.56
Combined   300     4.54   5.02    1.10   1.15     0.66

The  equations  below  were  fitted  to  the  observations  by  ordinary 
least  squares: 

$$v_1 = -.0023 + \underset{(2.23)}{.0719}\,x_1 + \underset{(1.79)}{.0551}\,x_2 - \underset{(-5.27)}{.5610}\,v_2$$

$$v_2 = .1525 + \underset{(0.77)}{.0131}\,x_1 - \underset{(-0.73)}{.0118}\,x_2 - \underset{(-5.27)}{.1531}\,v_1$$

The quantities in parentheses beneath the ILS estimates are the
t-statistics $t_{jk} = b_{jk}/\mathrm{s.e.}(b_{jk})$. The multipliers for
transforming the ILS estimates to the MLEs of the logistic regression
coefficients (3.2) are $K_1 = n/SS_1 = 300/61.96 = 4.842$ and
$K_2 = n/SS_2 = 300/16.91 = 17.74$. The values of the discriminant
function estimates determined from (3.13) are reported in Table 2, along
with the corresponding values of the CMLEs calculated from the same data
set.
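
As a check on the arithmetic, and assuming (as the reported values
suggest) that each slope estimate is the least squares coefficient
$b_{jk}$ multiplied by $K_j$, the slope entries of Table 2 can be
recovered to three decimals from the rounded quantities above:

$$
\begin{aligned}
K_1 b_{11} &= 4.842 \times .0719 \approx .348, & K_1 b_{12} &= 4.842 \times .0551 \approx .267, \\
K_2 b_{21} &= 17.74 \times .0131 \approx .232, & K_2 b_{22} &= 17.74 \times (-.0118) \approx -.209.
\end{aligned}
$$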


                                 Table 2

            ESTIMATES OF THE LOGISTIC REGRESSION COEFFICIENTS

                       Discriminant             Cond. maximum
                    function estimates            likelihood
                    Coeff.   s.e.     t        Coeff.   s.e.     t

Group 1 (College)
  Constant          -2.476                     -2.491
  x1                  .348   .156   2.23         .352   .157   2.24
  x2                  .267   .149   1.79         .268   .148   1.81

Group 2 (Military)
  Constant          -1.731                     -1.747
  x1                  .232   .300    .77         .235   .302    .78
  x2                 -.209   .286   -.73        -.207   .286   -.72

In this particular case, the agreement between the discriminant
function estimates and the CMLEs was remarkably close. While good
agreement between the two sets of estimates and their t-statistics is
expected, this level of agreement is unusual.


To illustrate that the discriminant function estimates and their
t-statistics can be determined directly from the augmented sample
covariance matrix $S$ for $x_1$, $x_2$, $v_1$, and $v_2$, the matrix and
its inverse are given below:

$$
S = \begin{bmatrix}
1.2029 & .8388 & .1415 & -.0158 \\
.8388 & 1.3298 & .1489 & -.0275 \\
.1415 & .1489 & .2460 & -.0376 \\
-.0158 & -.0275 & -.0376 & .0622
\end{bmatrix}
$$

$$
S^{-1} = \begin{bmatrix}
1.5093 & -.9179 & -.3482 & -.2324 \\
-.9179 & 1.3652 & -.2665 & .2089 \\
-.3482 & -.2665 & 4.8417 & 2.7168 \\
-.2324 & .2089 & 2.7168 & 17.7453
\end{bmatrix}
$$

The negatives of the discriminant function estimates appear in
the upper right-hand corner of $S^{-1}$, and the values of
$K_j = n/SS_j$ are the last two diagonal elements. The t-statistics for
the discriminant function estimates can be determined by first
calculating the partial correlation coefficients
$r_{jk} = -s^{jk}/[s^{jj}s^{kk}]^{1/2}$ and then applying
formula (3.16).
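
These steps can be reproduced with a short sketch, assuming NumPy. The
covariance matrix is entered as printed to four decimals, with symmetric
entries given the same sign. The conversion from partial correlation to
t-statistic, $t = r\sqrt{df}/\sqrt{1 - r^2}$ with $df = n - 4$, is an
assumption about the form of (3.16); it is consistent with the
t-statistics reported in Table 2. The variable names are illustrative.

```python
import numpy as np

n = 300
# Augmented sample covariance matrix for (x1, x2, v1, v2), four-decimal values
S = np.array([
    [ 1.2029,  0.8388,  0.1415, -0.0158],
    [ 0.8388,  1.3298,  0.1489, -0.0275],
    [ 0.1415,  0.1489,  0.2460, -0.0376],
    [-0.0158, -0.0275, -0.0376,  0.0622],
])
S_inv = np.linalg.inv(S)

# Slope estimates: negatives of the upper right-hand 2x2 block
# (rows x1, x2; columns Group 1, Group 2)
beta = -S_inv[:2, 2:]

# Multipliers K_j = n / SS_j: the last two diagonal elements
K = np.diag(S_inv)[2:]

# Partial correlations r_jk = -s^{jk} / sqrt(s^{jj} s^{kk})
scale = np.sqrt(np.diag(S_inv))
partial_r = -S_inv / np.outer(scale, scale)

# Assumed conversion to t-statistics (intercept plus three regressors)
df = n - 4
r = partial_r[:2, 2:]
t = r * np.sqrt(df) / np.sqrt(1.0 - r ** 2)
```

Because the input matrix is rounded, the computed values match the
printed inverse and Table 2 only to about two or three decimals.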

The t-statistics in Table 2 associated with $\hat\beta_{21}$ and
$\hat\beta_{22}$ can be used to test the hypotheses that
$\beta_{21} = 0$ and $\beta_{22} = 0$. Alternatively, Hotelling's $T^2$
could be used to test the hypothesis that $\beta_2 = 0$. Given that
these hypotheses are accepted, one might choose to combine Groups 2 and
3, thereby reducing the polytomous case to the dichotomous case. The
requisite calculations for fitting the dichotomous model can be
performed directly by inverting the appropriate 3x3 submatrix of $S$.
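
A minimal sketch of that reduction, assuming NumPy and assuming that the
appropriate submatrix is the one for $x_1$, $x_2$, and $v_1$ (so that
$v_1$ separates college attendance from the combined remaining groups).
The entries are the four-decimal values printed for the example; the
variable names are illustrative. No results for the dichotomous fit are
reported in the text, so the sketch only verifies the internal identity
linking the inverse to the least squares coefficients.

```python
import numpy as np

# Covariance submatrix for (x1, x2, v1), four-decimal values
S3 = np.array([
    [1.2029, 0.8388, 0.1415],
    [0.8388, 1.3298, 0.1489],
    [0.1415, 0.1489, 0.2460],
])
S3_inv = np.linalg.inv(S3)

# By analogy with the polytomous case: multiplier and slope estimates
K = S3_inv[2, 2]
beta = -S3_inv[:2, 2]

# Least squares slopes of v1 on (x1, x2), computed from the same matrix
b = np.linalg.solve(S3[:2, :2], S3[:2, 2])

# Residual variance of v1 given (x1, x2); its reciprocal should equal K
resid_var = S3[2, 2] - S3[:2, 2] @ b
```

Here `beta` equals `K * b` up to rounding error, which is the sense in
which inverting the submatrix and running the least squares regression
are the same calculation.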


REFERENCES 


Anderson, T. W., Introduction to Multivariate Statistical Analysis,
John Wiley & Sons, Inc., New York, 1958.

Anscombe, F. J., "On Estimating Binomial Response Relations,"
Biometrika, Vol. 43 (1956), pp. 461-464.

Berkson, Joseph, "Maximum Likelihood and Minimum $\chi^2$ Estimates
of the Logistic Function," Journal of the American Statistical
Association, Vol. 50 (1955), pp. 130-162.

Chow, Winston K., "A Look at Various Estimators in Logistic Models
in the Presence of Missing Values," The American Statistical Association
1979 Proceedings of the Business and Economic Statistics Section, pp.
417-420.

Cox, D. R., Analysis of Binary Data, Chapman and Hall, Ltd., London,
1970.

Das Gupta, S., "Some Aspects of Discrimination Function Coefficients,"
Sankhyā, Series A, Vol. 30 (1968), pp. 387-400.

Day, N. E., and D. F. Kerridge, "A General Maximum Likelihood
Discriminant," Biometrics, Vol. 23 (1967), pp. 313-323.

Dixon, W. J., and M. B. Brown (eds.), BMDP-79: Biomedical Computer
Programs, P-Series, University of California Press, Berkeley, 1979.

Efron, Bradley, "The Efficiency of Logistic Regression Compared to
Normal Discriminant Analysis," Journal of the American Statistical
Association, Vol. 70 (1975), pp. 892-898.

Ferguson, T. S., Mathematical Statistics: A Decision Theoretic
Approach, John Wiley & Sons, Inc., New York, 1967.

Fisher, R. A., "The Use of Multiple Measurements in Taxonomic
Problems," Annals of Eugenics, Vol. 7 (1936), pp. 179-188.

Fisher, R. A., "The Statistical Utilization of Multiple Measurements,"
Annals of Eugenics, Vol. 8 (1938), pp. 376-386.

Giri, N., "On the Likelihood Ratio Test of a Normal Multivariate
Testing Problem," Annals of Mathematical Statistics, Vol. 35 (1964),
pp. 181-190.

Haggstrom, G. W., "Notes on Logistic Regression and Discriminant
Analysis," The Rand Corporation, 1974, unpublished.

Halperin, Max, W. C. Blackwelder, and J. I. Verter, "Estimation
of the Multivariate Logistic Risk Function: A Comparison of the
Discriminant Function and Maximum Likelihood Approaches," Journal of
Chronic Diseases, Vol. 24 (1971), pp. 125-158.

Lachenbruch, P. A., Discriminant Analysis, Hafner Press, New York,
1975.

Lehmann, E. L., Testing Statistical Hypotheses, John Wiley & Sons,
Inc., New York, 1959.

Press, S. J., and Sandra Wilson, "Choosing Between Logistic
Regression and Discriminant Analysis," Journal of the American
Statistical Association, Vol. 73 (1978), pp. 699-705.

Rao, C. R., "Tests with Discriminant Functions in Multivariate
Analysis," Sankhyā, Vol. 7 (1946), pp. 407-414.

Rao, C. R., "Tests of Significance in Multivariate Analysis,"
Biometrika, Vol. 35 (1948), pp. 58-87.

Rao, C. R., Linear Statistical Inference and Its Applications,
John Wiley & Sons, Inc., New York, 1965.

Walker, Strother H., and David B. Duncan, "Estimation of the
Probability of an Event as a Function of Several Independent Variables,"
Biometrika, Vol. 54 (1967), pp. 167-169.

Warner, Stanley L., Stochastic Choice of Mode in Urban Travel:
A Study in Binary Choice, Northwestern University Press, Evanston,
Illinois, 1962.


